Quantitative Trading Strategy with Python



Kannan Singaravelu, CQF

@kannan_fno | kannansingaravelu.com | github.com/kannansingaravelu


© 2020. All rights reserved.

Agenda

  • Overview of quant workflow
  • Trading strategy workflow
  • Backtesting
  • Performance & risk metrics

About me

  • Data Science Enthusiast
  • Quant Researcher and Derivatives Structurer
  • Alumnus of Paul Wilmott’s Certificate in Quantitative Finance
  • Alumnus of WorldQuant University’s Applied Data Science
  • B.E. (Mech), M.B.A with more than 16 years of experience in financial industry
  • Strong advocate of Financial Data Science

Quantitative Trading

Quantitative trading consists of trading strategies based on quantitative analysis that rely on mathematics and statistics to identify trading opportunities. Quantitative strategies can be broadly classified into a) Trend Following and b) Mean Reversion.

Almost 85% of CTA returns can be explained by simple trend following. Momentum strategies are by far the most popular systematic strategies used by hedge fund managers. Pair trading strategies are a popular version of statistical arbitrage that employ mean reversion models to capture short-term mispricing.

Quantitative strategies involve heavy use of quantitative and computational approaches to trading.

Ideation

While moving average models are the ‘hello world’ of trend following strategies, in the era of machine learning I approach the problem purely from a quantitative perspective, where all signals are based on raw price data and its statistical properties.

The strategy is to buy the Indian equity benchmark, the Nifty Index, when the trend score rises above a certain threshold and to sell it when the score falls below a certain threshold.

Data Mining

Data can be mined from Bloomberg, a broker's API, data vendors or Refinitiv's Eikon API. Nifty futures 1-minute data from August 2010 to August 2019 is used for this exercise.

Importing Required Libraries

We'll import the required libraries that we'll use.

In [1]:
# Import the pandas library
import pandas as pd

# Import the numpy library
import numpy as np

# Import trend score function from the pyalgo library
from pyalgo.tseries import trend_score

# Import cufflinks library for data visualization
import cufflinks as cf 
cf.set_config_file(offline=True)

# Import financial function library for performance metrics
import ffn

# Ignore warnings
import warnings
warnings.filterwarnings("ignore")

Reading a CSV File

We'll read historical intraday data of NIFTY Index.

In [4]:
# Retrieve Data from text file
df = pd.read_csv('data/NIFTY-I-NEW.txt', sep=',', header=None) 
In [6]:
df.head()
Out[6]:
   date        time      open     high     low     close   volume   datetime
0  2010-08-16  09:15:59  5453.95  5453.95  5452.5  5452.6  21200.0  2010-08-16 09:15:59
1  2010-08-16  09:16:59  5453.00  5454.00  5452.6  5454.0  21250.0  2010-08-16 09:16:59
2  2010-08-16  09:17:59  5453.90  5454.00  5451.4  5451.4  32750.0  2010-08-16 09:17:59
3  2010-08-16  09:18:59  5451.30  5453.00  5451.0  5453.0  26850.0  2010-08-16 09:18:59
4  2010-08-16  09:19:59  5452.55  5453.95  5451.5  5452.0  22900.0  2010-08-16 09:19:59

Data Wrangling

Now, we will resample the data to a 5-minute interval and manipulate it to generate trade signals.

In [7]:
# Resample to 5-Min
df = (
    df.set_index('datetime')
    .drop(['date', 'time'], axis=1)
    .resample('5min')
    .agg({'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volume': 'sum'})
    .dropna()
)
In [8]:
df.head()
Out[8]:
                     open     high     volume    low      close
datetime
2010-08-16 09:15:00  5453.95  5454.00  124950.0  5451.00  5452.0
2010-08-16 09:20:00  5452.20  5454.60  200500.0  5445.30  5447.1
2010-08-16 09:25:00  5447.30  5449.55  178450.0  5444.25  5446.0
2010-08-16 09:30:00  5445.25  5451.00  154250.0  5444.15  5449.3
2010-08-16 09:35:00  5449.15  5455.60  149100.0  5449.05  5454.9
In [9]:
# Plot Index Price
df['close'].iplot(title='Nifty Index from 2010-2019')
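To make the resampling step concrete, here is a minimal sketch on a tiny synthetic data set. The prices and volumes are made up for illustration and are not the Nifty data; only the aggregation rules mirror the cell above.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute bars (made-up numbers, not the Nifty data set)
idx = pd.date_range("2010-08-16 09:15", periods=10, freq="1min")
close = pd.Series(5450.0 + np.arange(10), index=idx)
bars = pd.DataFrame({
    "open": close - 0.5,
    "high": close + 1.0,
    "low": close - 1.0,
    "close": close,
    "volume": 100.0,
})

# Each 5-minute bar takes the first open, highest high, lowest low,
# last close and total volume of the 1-minute bars it covers
five_min = bars.resample("5min").agg(
    {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
).dropna()
print(five_min)
```

The ten 1-minute bars collapse into two 5-minute bars labelled 09:15 and 09:20. The `dropna()` call matters on real data because it removes the empty intervals that resampling creates outside market hours.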

Strategy and Trading Rule Definition

We'll define the strategy and rules based on which the trade signals would be generated.

Trend Indicator

Trend score is a measure of how strongly the time series is trending. Consider a time series of length $n$ with ordered components $\{x_i\}$ for $i=1,...,n$

A score between -1 and +1 can be associated to $\{x_i\}$ based on how strongly the time series is trending. Scores near $\pm$1 will be associated with strongly positively/negatively trending series, and scores near 0 will be assigned to series that do not exhibit trending characteristics. Trend Score determines the trading signal based on the statistical strength of the realized return:

$$TrendScore_i = \left\{ \begin{array} {ll} +1 &\mbox{if } score>+1 \\ -1 &\mbox{if } score<-1 \\ score &\mbox{otherwise} \end{array} \right.$$

When the absolute value of the raw score (before capping) is greater than 1, the trend is highly statistically significant, so the strategy puts 100% exposure to the asset.
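The `trend_score` function used in this notebook comes from `pyalgo` and its internals are not shown here. The sketch below is therefore only one possible way to build a capped score of this kind, not the library's implementation. It assumes the raw score is the t-statistic of the mean log return divided by a hypothetical scaling parameter `t_full`, then capped to the range -1 to +1 as in the formula above. The function name and the price series are made up.

```python
import numpy as np
import pandas as pd

def clipped_trend_score(prices, t_full=2.0):
    """Hypothetical trend score: scaled t-statistic of the mean log return, capped to [-1, 1]."""
    prices = np.asarray(prices, dtype=float)
    rets = np.diff(np.log(prices))
    if rets.size < 2:
        return 0.0
    sd = rets.std(ddof=1)
    if sd == 0:
        # No variation at all: fall back to the sign of the constant return
        return float(np.sign(rets.mean()))
    t_stat = rets.mean() / (sd / np.sqrt(rets.size))
    return float(np.clip(t_stat / t_full, -1.0, 1.0))

# Made-up series: a steady uptrend and a driftless zig-zag
steps = np.arange(120)
up = pd.Series(100.0 * np.exp(0.001 * steps + 0.0005 * (-1.0) ** steps))
flat = pd.Series(100.0 + (-1.0) ** steps)

# Same rolling pattern as the notebook: one score per 60-bar window
up_score = up.rolling(60).apply(clipped_trend_score, raw=True)
flat_score = flat.rolling(60).apply(clipped_trend_score, raw=True)
print(up_score.iloc[-1], flat_score.iloc[-1])
```

The uptrend is capped at +1, while the zig-zag series scores close to 0. The first 59 values of each rolling result are NaN because a full 60-bar window is not yet available.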

Trading Rule

Enter (Buy) when the trend score is above a specific value. Exit (Sell) the position when the trend score falls below a specific value. The trend duration is specified by another parameter.

In [10]:
# Apply Trend Score
df['tscore'] = df['close'].rolling(60).apply(trend_score, raw=True).fillna(0)
In [11]:
# Check the last five values
df['tscore'].tail()
Out[11]:
datetime
2019-08-26 15:05:00    1.0
2019-08-26 15:10:00    1.0
2019-08-26 15:15:00    1.0
2019-08-26 15:20:00    1.0
2019-08-26 15:25:00    1.0
Name: tscore, dtype: float64

Validation (backtesting)

Backtesting is the process of applying a trading strategy to historical data to simulate how it would have performed in the past, without actually risking capital.

Vectorized Backtesting

There are several ways to backtest a strategy on historical data. In this exercise, we'll demonstrate a vectorized backtest.

Why Vectorize?

The most obvious way to run a backtest over historical data is to create a loop and feed price/signal information one-by-one to a decision engine that determines whether we buy, sell or do nothing based on, say, price/signal action. But this can be expensive and painfully slow if we were to run 166k data points with different parameters.
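As a small check on made-up data, the sketch below computes the same strategy returns twice: once with an explicit loop and once with whole-column operations. The prices and the 0/1 signal are random and purely illustrative.

```python
import numpy as np
import pandas as pd

# Made-up prices and a made-up 0/1 signal, for illustration only
rng = np.random.default_rng(0)
close = pd.Series(100.0 * np.exp(np.cumsum(rng.normal(0, 0.001, 1000))))
signal = pd.Series(rng.integers(0, 2, 1000))

# Loop version: walk through the bars, applying the previous bar's signal
loop_returns = [0.0]
for i in range(1, len(close)):
    bar_return = close.iloc[i] / close.iloc[i - 1] - 1
    loop_returns.append(signal.iloc[i - 1] * bar_return)

# Vectorized version: the same calculation as whole-column operations
vec_returns = (signal.shift(1) * close.pct_change()).fillna(0)

print(np.allclose(loop_returns, vec_returns))
```

Both versions give the same numbers. The vectorized form pushes the arithmetic into pandas and NumPy instead of the Python interpreter, which is why it scales better when many parameter sets are tested.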

Backtesting Assumptions

  1. The backtest assumes zero friction and fully collateralized trades (returns are calculated on the notional value of the contract, not on the margin deployed for the trade).
  2. Signals are not optimized for this exercise and no short selling is allowed.
  3. Each trade is entered at the close price of the period.
  4. The position is held until a sell signal is triggered.
  5. Each trade is left untouched, and all statistics are recorded between entry and exit levels.
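The zero-friction assumption can be relaxed with a small change. The sketch below is an illustration on made-up prices, and the cost level `cost_per_side` is a hypothetical value chosen for the example, not a figure from this backtest. A cost is charged each time the position changes.

```python
import pandas as pd

# Made-up prices and signal; the cost level is an assumption for illustration
close = pd.Series([100.0, 101.0, 102.0, 101.0, 103.0])
signal = pd.Series([1, 1, 0, 1, 1])
cost_per_side = 0.00025

# Position held during each bar is the previous bar's signal
position = signal.shift(1).fillna(0)
gross = position * close.pct_change().fillna(0)

# Turnover is 1 on every bar where we enter or exit, 0 otherwise
turnover = position.diff().abs().fillna(position.abs())
net = gross - cost_per_side * turnover
print(pd.DataFrame({"position": position, "gross": gross, "net": net}))
```

Here the position changes three times (enter, exit, enter), so the net return is lower than the gross return by three times the per-side cost.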

Signal Generation

In [12]:
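# Define signal based on trend score: long (1) only when the score is capped at +1
df['signal'] = np.where(df['tscore']==1,1,0)
# Force flat (0) below 0.8; bars with tscore == 1 are never below 0.8, so this line leaves the signal unchanged
df['signal'] = np.where(df['tscore']<0.8,0,df['signal'])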

# Calculate strategy return 
df['returns'] = (df['signal'].shift(1) * df['close'].pct_change()).fillna(0)

# Construct equity curve
df['eq_curve'] = 1e5 * (1+df['returns']).cumprod()
In [13]:
# Plot Strategy Curve
df[['close', 'eq_curve']].normalize().iplot(title='Strategy Equity Curve')
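In the signal cell, the second `np.where` line cannot change the first, so the strategy is long only on bars where the score equals 1. If the intent is to enter at a score of 1 and stay long until the score drops below 0.8, as the trading rule describes, one possible sketch is shown below. This is an assumption about intent, uses made-up scores, and would produce different results from those reported in this notebook.

```python
import numpy as np
import pandas as pd

# Made-up trend scores, not taken from the backtest
tscore = pd.Series([0.0, 1.0, 0.9, 0.85, 0.7, 0.9, 1.0, 0.95, 0.5])

# 1 marks an entry bar, 0 marks an exit bar, NaN means "keep the previous state"
state = pd.Series(np.nan, index=tscore.index)
state[tscore >= 1.0] = 1.0   # enter at full score (entry level assumed)
state[tscore < 0.8] = 0.0    # exit below the lower threshold

# Carry the last decision forward and start flat
signal = state.ffill().fillna(0).astype(int)
print(signal.tolist())
```

The position stays on through the 0.9 and 0.85 readings that follow an entry, and a 0.9 reading after an exit does not reopen it.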

Performance Statistics

In [14]:
# Get Performance Metrics
perf = df['eq_curve'].calc_stats()
perf.display()
Stats for eq_curve from 2010-08-16 09:15:00 - 2019-08-26 15:25:00
Annual risk-free rate considered: 0.00%
Summary:
Total Return      Sharpe  CAGR    Max Drawdown
--------------  --------  ------  --------------
156.32%             1.64  10.99%  -7.64%

Annualized Returns:
mtd    3m     6m      ytd     1y      3y      5y      10y     incep.
-----  -----  ------  ------  ------  ------  ------  ------  --------
1.46%  2.22%  12.84%  13.48%  17.26%  14.21%  12.20%  10.99%  10.99%

Periodic:
        daily    monthly    yearly
------  -------  ---------  --------
sharpe  1.64     1.73       2.05
mean    10.94%   10.58%     10.02%
vol     6.66%    6.13%      4.89%
skew    1.94     0.44       -1.27
kurt    17.54    0.36       0.24
best    3.77%    6.00%      13.84%
worst   -3.22%   -3.42%     0.60%

Drawdowns:
max     avg       # days
------  ------  --------
-7.64%  -0.75%     22.45

Misc:
---------------  -------
avg. up month    1.79%
avg. down month  -0.87%
up year %        100.00%
12m up %         96.94%
---------------  -------
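For readers who want to see where the headline numbers come from, the sketch below computes total return, CAGR, Sharpe ratio and maximum drawdown by hand from a made-up daily equity curve. It assumes a zero risk-free rate and 252 periods per year. The exact conventions `ffn` uses may differ, so this is not a reproduction of the table above.

```python
import numpy as np
import pandas as pd

def basic_stats(equity, periods_per_year=252):
    """Total return, CAGR, annualized Sharpe (zero risk-free rate) and max drawdown."""
    equity = pd.Series(equity, dtype=float)
    rets = equity.pct_change().dropna()
    years = (equity.index[-1] - equity.index[0]).days / 365.25
    total_return = equity.iloc[-1] / equity.iloc[0] - 1
    cagr = (equity.iloc[-1] / equity.iloc[0]) ** (1 / years) - 1
    sharpe = np.sqrt(periods_per_year) * rets.mean() / rets.std()
    # Drawdown is the percentage distance below the running peak
    drawdown = equity / equity.cummax() - 1
    return {"total_return": total_return, "cagr": cagr,
            "sharpe": sharpe, "max_drawdown": drawdown.min()}

# Made-up daily equity curve
idx = pd.date_range("2020-01-01", periods=5, freq="D")
stats = basic_stats(pd.Series([100.0, 110.0, 99.0, 104.5, 121.0], index=idx))
print(stats)
```

In this toy curve the total return is 21% and the maximum drawdown is 10%, the fall from 110 to 99. The CAGR is meaningless over five days and is included only to show the formula.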

Drawdown Details

In [16]:
perf.drawdown_details.head()
Out[16]:
   Start                End                  Length  drawdown
0  2010-08-20 00:00:00  2010-09-01 00:00:00  12      -0.00135477
1  2010-09-08 00:00:00  2010-09-13 00:00:00  5       -0.00145939
2  2010-09-15 00:00:00  2010-09-20 00:00:00  5       -0.00554516
3  2010-09-21 00:00:00  2010-09-27 00:00:00  6       -0.00540358
4  2010-10-14 00:00:00  2010-10-18 00:00:00  4       -0.00235984

Backtesting Results

In [18]:
# Define position to calculate entry and exit levels
df['position'] = df['signal'].shift(1).diff().fillna(0)
# Mark the first bar as an entry; .loc avoids chained assignment
df.loc[df.index[0], 'position'] = 1
In [19]:
# Results dataframe
df1 = pd.concat([
    
    pd.DataFrame({'price': df.loc[df.position == 1,'close'],
                  'position': df.loc[df.position == 1, 'position'],
                 }),
    
    pd.DataFrame({'price': df.loc[df.position == -1,'close'],
                  'position': df.loc[df.position == -1, 'position'],
                 }),
])

df1.sort_index(inplace=True)
df1['return'] = df1['price'].pct_change().shift(-1).fillna(0)
# df1['return'] = df1['return'] - 0.00025
In [20]:
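# Record positive trades and returns
# s1 flags entry rows, s2 flags rows with a positive return
s1 = [1 if x > 0 else 0 for x in df1['position']] 
s2 = [1 if x > 0 else 0 for x in df1['return']]
# sig flags entries that ended with a positive return
sig = [a*b for a, b in zip(s1,s2)]

# ret keeps the return of each entry row and zero elsewhere
ret = [a*x for a,x in zip(s1,df1['return'])]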
In [21]:
# Output backtest statistics
{'Total Trades': sum(s1), 
 'Positive Trades': sum(sig), 
 'Negative Trades': sum(s1) - sum(sig),
 'Total Returns': round(sum(ret)*100,2),
 'Gr. +ve Returns': round(sum([x if x>0 else 0 for x in ret])*100,2),
 'Gr. -ve Returns': round(sum([x if x<0 else 0 for x in ret])*100,2),
 'Avg. +ve Returns': round(sum([x if x>0 else 0 for x in ret])/(sum(sig))*100,2),
 'Avg. -ve Returns': round(sum([x if x<0 else 0 for x in ret])/(sum(s1) - sum(sig))*100,2),
 'Mean Reward:Risk': round(abs((sum([x if x>0 else 0 for x in ret])/(sum(sig)))/(sum([x if x<0 else 0 for x in ret])/(sum(s1)-sum(sig)))),2),
 'Profit Factor': round(sum([x if x>0 else 0 for x in ret]) / abs(sum([x if x<0 else 0 for x in ret])),2),
 'Win Rate': round(sum(sig)/sum(s1)*100,2),
}
Out[21]:
{'Avg. +ve Returns': 0.23,
 'Avg. -ve Returns': -0.13,
 'Gr. +ve Returns': 298.64,
 'Gr. -ve Returns': -218.59,
 'Mean Reward:Risk': 1.76,
 'Negative Trades': 1704,
 'Positive Trades': 1321,
 'Profit Factor': 1.37,
 'Total Returns': 80.04,
 'Total Trades': 3025,
 'Win Rate': 43.67}
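The same statistics can be wrapped in a small helper. The sketch below is an illustration with a hypothetical function name and made-up trade returns. Following the cell above, trades with a zero return are counted as negative trades.

```python
import numpy as np

def trade_stats(trade_returns):
    """Summarize per-trade returns given as fractions (0.01 means 1%)."""
    r = np.asarray(trade_returns, dtype=float)
    wins, losses = r[r > 0], r[r <= 0]
    avg_win = wins.mean() if wins.size else 0.0
    avg_loss = losses.mean() if losses.size else 0.0
    return {
        "total_trades": int(r.size),
        "win_rate": wins.size / r.size,
        # Gross profit divided by gross loss
        "profit_factor": wins.sum() / abs(losses.sum()) if losses.sum() != 0 else float("inf"),
        # Average win divided by average loss
        "reward_risk": abs(avg_win / avg_loss) if avg_loss != 0 else float("inf"),
    }

# Made-up trade returns
stats = trade_stats([0.02, -0.01, 0.03, -0.01, -0.01])
print(stats)
```

With two wins averaging 2.5% and three losses averaging 1%, the win rate is 40%, the reward:risk ratio is 2.5 and the profit factor is about 1.67.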

Reward : Risk

Why does this strategy make money despite a win rate below 50%?

In [22]:
# Reward:Risk
mean_rr = 1.77

# Win Rate
win_rate = 41.23/100

# Expectancy >> a positive expectancy is a must
(win_rate*mean_rr - (1-win_rate)*1)

# Frequency
frequency = 800
In [23]:
# Simulate PNL for a given RR, WinRate and Frequency

pnl = [0]

for i in range(frequency):
    # Draw one outcome; a profit occurs with probability win_rate
    trade = np.random.choice(['profit', 'loss'], p=[win_rate, 1-win_rate])
    
    if trade == 'profit':
        pnl.append(mean_rr)
        
    elif trade == 'loss':
        pnl.append(-1)
In [24]:
# Plot Simulated PNL
pd.Series(pnl).cumsum().iplot(title='Simulated Gain for a Specific Reward:Risk')
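The reasoning behind the simulation can be written down directly. If every loss costs 1 unit and every win pays the reward:risk ratio, the expected profit per trade is the win rate times the ratio, minus the loss rate. The sketch below uses hypothetical helper names with the win rate and ratio from the cells above.

```python
def expectancy(win_rate, reward_risk):
    """Expected profit per trade, in units of the amount risked."""
    return win_rate * reward_risk - (1 - win_rate)

def breakeven_win_rate(reward_risk):
    """Win rate at which the expectancy is exactly zero."""
    return 1 / (1 + reward_risk)

# Values used in the simulation cells above
e = expectancy(0.4123, 1.77)
p_star = breakeven_win_rate(1.77)
print(round(e, 4), round(p_star, 4))
```

With these inputs the expectancy is about 0.14 units per trade and the break-even win rate is about 36%. Any win rate above that level is profitable on average at this reward:risk ratio, which is why a win rate below 50% can still make money.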


Thank you!